Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural nets or history-based state abstraction, perform better than their memory-less counterparts, due to the fact that function approximation in Markov decision processes (MDP) can be viewed as inducing a partially observable MDP. However, there has been little formal analysis of such history-based algorithms, as most existing frameworks focus exclusively on memory-less features. In this paper, we introduce a theoretical framework for studying the behaviour of RL algorithms that learn to control an MDP using history-based feature abstraction mappings. Furthermore, we use this framework to design a practical RL algorithm and we numerically evaluate its effectiveness on a set of continuous control tasks.
translated by Google Translate
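In this paper, we study the problem of system identification for autonomous switched linear systems with complete state observations. We propose a switched least squares method for identifying switched linear systems, show that the method is strongly consistent, and derive data-dependent and data-independent rates of convergence. In particular, our data-dependent rate of convergence shows that, almost surely, the system identification error is $\mathcal{O}\big(\sqrt{\log(T)/T}\big)$, where $T$ is the time horizon. These results show that our method for switched linear systems has the same rate of convergence as the least squares method for non-switched linear systems. We compare our results with those in the literature. We present numerical examples to illustrate the performance of the proposed system identification method.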
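As a rough illustration of the per-mode least squares idea described above, the following minimal sketch estimates the parameters of a scalar switched system $x_{t+1} = a_{s_t} x_t + w_t$. It is not the paper's algorithm: the scalar state, the i.i.d. uniform switching signal, the assumption that the active mode $s_t$ is observed, and the names `simulate` and `switched_least_squares` are all hypothetical choices made for this example.

```python
import random


def simulate(a_true, horizon, seed=0):
    """Simulate x_{t+1} = a_true[s_t] * x_t + w_t with unit-variance Gaussian noise.

    Assumption for this sketch: the mode s_t is drawn i.i.d. uniformly and is observed.
    """
    rng = random.Random(seed)
    x = 0.0
    states, modes = [x], []
    for _ in range(horizon):
        mode = rng.randrange(len(a_true))
        modes.append(mode)
        x = a_true[mode] * x + rng.gauss(0.0, 1.0)
        states.append(x)
    return states, modes


def switched_least_squares(states, modes):
    """Run a separate scalar least squares fit for each mode.

    For mode i, the estimate is sum(x_{t+1} * x_t) / sum(x_t ** 2),
    taken over the time steps t at which mode i was active.
    """
    num, den = {}, {}
    for t in range(len(states) - 1):
        mode = modes[t]
        num[mode] = num.get(mode, 0.0) + states[t + 1] * states[t]
        den[mode] = den.get(mode, 0.0) + states[t] ** 2
    return {mode: num[mode] / den[mode] for mode in num if den[mode] > 0.0}


# Hypothetical two-mode system; both modes are stable.
a_true = [0.5, -0.3]
states, modes = simulate(a_true, horizon=5000)
a_hat = switched_least_squares(states, modes)
for mode in sorted(a_hat):
    print(f"mode {mode}: true a = {a_true[mode]:+.2f}, estimate = {a_hat[mode]:+.3f}")
```

Each mode is fitted only on the samples collected while that mode was active, so the estimation error for a mode shrinks as that mode is visited more often.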
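In this paper, we consider the problem of allocating human operators in a system with multiple semi-autonomous robots. Each robot is required to perform an independent sequence of tasks, with a chance of failing and getting stuck in a fault state at every task. When required, a human operator can assist or teleoperate a robot. Conventional MDP techniques used to solve such problems face scalability issues because the state and action spaces grow exponentially with the number of robots and operators. In this paper, we derive conditions under which the operator allocation problem is indexable, which enables the use of the Whittle index heuristic. The conditions can be easily checked to verify indexability, and we show that they hold for a wide range of problems of interest. Our key insight is to exploit the structure of the value function of individual robots, which leads to conditions that can be verified separately for each state of each robot. We apply these conditions to two types of transitions commonly seen in remote robot supervision systems. Through numerical simulations, we demonstrate the efficacy of the Whittle index policy as a near-optimal and scalable approach that outperforms existing scalable methods.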
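We revisit the Thompson sampling algorithm for controlling an unknown linear quadratic (LQ) system recently proposed by Ouyang et al. (arXiv:1709.04047). The regret bound of the algorithm was derived under a technical assumption on the induced norm of the closed-loop system. In this technical note, we show that by making a minor modification to the algorithm (in particular, ensuring that an episode does not end too soon), the technical assumption on the induced norm can be replaced by an assumption in terms of the spectral radius of the closed-loop system. The modified algorithm has the same Bayesian regret of $\tilde{\mathcal{O}}(\sqrt{T})$, where $T$ is the time horizon and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in~$T$.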
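A decentralized linear quadratic system with a major agent and minor agents is considered. The major agent affects the minor agents, but not vice versa. The state of the major agent is observed by all agents. In addition, the minor agents have a noisy observation of their local state. The noise process is \emph{not} assumed to be Gaussian. The structures of the optimal strategy and the best linear strategy are characterized. It is shown that the major agent's optimal control action is a linear function of the major agent's MMSE (minimum mean squared error) estimate of the system state, while the minor agent's optimal control action is a linear function of the major agent's MMSE estimate of the system state and a "correction term" that depends on the difference between the minor agent's MMSE estimate of its local state and the major agent's MMSE estimate of the minor agent's local state. Since the noise is non-Gaussian, the minor agent's MMSE estimate is a non-linear function of its observation. It is shown that replacing the minor agent's MMSE estimate with its linear least mean square estimate gives the best linear control strategy. The results are proved using a direct method based on conditional independence, splitting of the state and control actions based on common information, and simplification of the per-step cost based on conditional independence, the orthogonality principle, and completion of squares.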
The paper presents a cross-domain review analysis on four popular review datasets: Amazon, Yelp, Steam, IMDb. The analysis is performed using Hadoop and Spark, which allows for efficient and scalable processing of large datasets. By examining close to 12 million reviews from these four online forums, we hope to uncover interesting trends in sales and customer sentiment over the years. Our analysis will include a study of the number of reviews and their distribution over time, as well as an examination of the relationship between various review attributes such as upvotes, creation time, rating, and sentiment. By comparing the reviews across different domains, we hope to gain insight into the factors that drive customer satisfaction and engagement in different product categories.
Automated offensive language detection is essential in combating the spread of hate speech, particularly in social media. This paper describes our work on Offensive Language Identification in Marathi, a low-resource Indic language. The problem is formulated as a text classification task to identify a tweet as offensive or non-offensive. We evaluate different mono-lingual and multi-lingual BERT models on this classification task, focusing on BERT models pre-trained with social media datasets. We compare the performance of MuRIL, MahaTweetBERT, MahaTweetBERT-Hateful, and MahaBERT on the HASOC 2022 test set. We also explore external data augmentation from other existing Marathi hate speech corpora, HASOC 2021 and L3Cube-MahaHate. MahaTweetBERT, a BERT model pre-trained on Marathi tweets, outperforms all models when fine-tuned on the combined dataset (HASOC 2021 + HASOC 2022 + MahaHate), with an F1 score of 98.43 on the HASOC 2022 test set. With this, we also provide a new state-of-the-art result on the HASOC 2022 / MOLD v2 test set.
We consider the problem of continually releasing an estimate of the population mean of a stream of samples that is user-level differentially private (DP). At each time instant, a user contributes a sample, and the users can arrive in arbitrary order. Until now, the requirements of continual release and user-level privacy have been considered in isolation. In practice, however, both requirements arise together, as users often contribute data repeatedly and multiple queries are made. We provide an algorithm that outputs a mean estimate at every time instant $t$ such that the overall release is user-level $\varepsilon$-DP and has the following error guarantee: Denoting by $M_t$ the maximum number of samples contributed by a user, as long as $\tilde{\Omega}(1/\varepsilon)$ users have $M_t/2$ samples each, the error at time $t$ is $\tilde{O}(1/\sqrt{t}+\sqrt{M_t}/(t\varepsilon))$. This is a universal error guarantee which is valid for all arrival patterns of the users. Furthermore, it (almost) matches the existing lower bounds for the single-release setting at all time instants when users have contributed an equal number of samples.
Speech-centric machine learning systems have revolutionized many leading domains ranging from transportation and healthcare to education and defense, profoundly changing how people live, work, and interact with each other. However, recent studies have demonstrated that many speech-centric ML systems may need to be considered more trustworthy for broader deployment. Specifically, concerns over privacy breaches, discriminating performance, and vulnerability to adversarial attacks have all been discovered in ML research fields. In order to address the above challenges and risks, a significant number of efforts have been made to ensure these ML systems are trustworthy, especially private, safe, and fair. In this paper, we conduct the first comprehensive survey on speech-centric trustworthy ML topics related to privacy, safety, and fairness. In addition to serving as a summary report for the research community, we point out several promising future research directions to inspire the researchers who wish to explore further in this area.
The automated synthesis of correct-by-construction Boolean functions from logical specifications is known as the Boolean Functional Synthesis (BFS) problem. BFS has many application areas that range from software engineering to circuit design. In this paper, we introduce a tool BNSynth, that is the first to solve the BFS problem under a given bound on the solution space. Bounding the solution space induces the synthesis of smaller functions that benefit resource constrained areas such as circuit design. BNSynth uses a counter-example guided, neural approach to solve the bounded BFS problem. Initial results show promise in synthesizing smaller solutions; we observe at least \textbf{3.2X} (and up to \textbf{24X}) improvement in the reduction of solution size on average, as compared to state of the art tools on our benchmarks. BNSynth is available on GitHub under an open source license.